About Statistical machine translation

Translation and Dictionaries: dictionary search [provisional development version]

Statistical machine translation: English Wikipedia edition

Statistical machine translation
Statistical machine translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The statistical approach contrasts with the rule-based approaches to machine translation as well as with example-based machine translation.
The first ideas of statistical machine translation were introduced by Warren Weaver in 1949,〔W. Weaver (1955). Translation (1949). In: ''Machine Translation of Languages'', MIT Press, Cambridge, MA.〕 including the idea of applying Claude Shannon's information theory. Statistical machine translation was re-introduced in the late 1980s and early 1990s by researchers at IBM's Thomas J. Watson Research Center and has contributed to a significant resurgence of interest in machine translation. At the time of writing, it is by far the most widely studied machine translation method.
==Basis==

The idea behind statistical machine translation comes from information theory. A document is translated according to the probability distribution $p(e|f)$ that a string $e$ in the target language (for example, English) is the translation of a string $f$ in the source language (for example, French).
The problem of modeling the probability distribution $p(e|f)$ has been approached in a number of ways. One approach which lends itself well to computer implementation is to apply Bayes' theorem, that is, $p(e|f) \propto p(f|e) p(e)$, where the translation model $p(f|e)$ is the probability that the source string is the translation of the target string, and the language model $p(e)$ is the probability of seeing that target language string. This decomposition is attractive as it splits the problem into two subproblems. Finding the best translation $\tilde{e}$ is done by picking the one that gives the highest probability:
:$\tilde{e} = \arg\max_{e \in e^*} p(e|f) = \arg\max_{e \in e^*} p(f|e) p(e)$.
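The selection rule can be illustrated with a minimal sketch over a small, fixed candidate set. The probability tables, the example strings and the function names `score` and `best_translation` are all hypothetical, invented here for illustration; a real system estimates these quantities from corpora and cannot enumerate its candidates in advance.

```python
import math

# Hypothetical toy probabilities, invented purely for illustration.
# translation_model[(f, e)] stands in for p(f|e); language_model[e] for p(e).
translation_model = {
    ("la maison", "the house"): 0.7,
    ("la maison", "house the"): 0.7,
    ("la maison", "the home"): 0.2,
}
language_model = {
    "the house": 0.05,
    "house the": 0.0001,
    "the home": 0.02,
}


def score(f, e):
    """Return log p(f|e) + log p(e), or -inf if either factor is zero or unknown."""
    tm = translation_model.get((f, e), 0.0)
    lm = language_model.get(e, 0.0)
    if tm <= 0.0 or lm <= 0.0:
        return float("-inf")
    return math.log(tm) + math.log(lm)


def best_translation(f, candidates):
    """Pick the candidate e that maximizes p(f|e) * p(e)."""
    return max(candidates, key=lambda e: score(f, e))


candidates = ["the house", "house the", "the home"]
best = best_translation("la maison", candidates)
```

With these toy numbers the translation model alone cannot separate "the house" from "house the"; the language model breaks the tie, so `best` is "the house". Working in log space avoids numerical underflow when many small probabilities are multiplied.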
For a rigorous implementation of this one would have to perform an exhaustive search by going through all strings $e^*$ in the native (target) language. Performing the search efficiently is the work of a machine translation decoder, which uses the foreign (source) string, heuristics and other methods to limit the search space while keeping acceptable quality. This trade-off between quality and time usage can also be found in speech recognition.
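One common heuristic for limiting the search space is beam search, which keeps only the few best partial translations at each step. The following is a minimal sketch under strong simplifying assumptions: translation proceeds word by word in source order (no reordering, no phrases), and the tables `word_translations` and `bigram_lm`, the floor value `UNSEEN` and the function `beam_decode` are hypothetical names with invented toy values.

```python
import math

# Hypothetical toy models, invented for illustration only.
# word_translations[f] lists (e, p(f|e)) options for each foreign word f.
word_translations = {
    "la": [("the", 0.9), ("it", 0.1)],
    "maison": [("house", 0.6), ("home", 0.4)],
    "bleue": [("blue", 0.9), ("sad", 0.1)],
}
# bigram_lm[(prev, word)] stands in for p(word | prev); "<s>" marks sentence start.
bigram_lm = {
    ("<s>", "the"): 0.5, ("<s>", "it"): 0.3,
    ("the", "house"): 0.2, ("the", "home"): 0.05,
    ("it", "house"): 0.01, ("it", "home"): 0.01,
    ("house", "blue"): 0.05, ("house", "sad"): 0.01,
    ("home", "blue"): 0.02, ("home", "sad"): 0.01,
}
UNSEEN = 1e-6  # crude floor for bigrams missing from the toy table


def beam_decode(foreign_words, beam_size=2):
    """Monotone, word-by-word beam search over partial translations."""
    beam = [(0.0, ["<s>"])]  # each hypothesis: (log score, words so far)
    for f in foreign_words:
        expanded = []
        for log_score, words in beam:
            for e, tm_prob in word_translations[f]:
                lm_prob = bigram_lm.get((words[-1], e), UNSEEN)
                new_score = log_score + math.log(tm_prob) + math.log(lm_prob)
                expanded.append((new_score, words + [e]))
        # Prune: keep only the beam_size highest-scoring partial hypotheses.
        expanded.sort(key=lambda h: h[0], reverse=True)
        beam = expanded[:beam_size]
    best_score, best_words = beam[0]
    return " ".join(best_words[1:]), best_score


translation, log_score = beam_decode(["la", "maison", "bleue"])
```

On this toy input the decoder returns "the house blue". The unnatural English word order shows what the monotone assumption costs: a practical decoder must also search over reorderings, which is one reason the search space grows so quickly.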
As the translation systems are not able to store all native strings and their translations, a document is typically translated sentence by sentence, but even this is not enough. Language models are typically approximated by smoothed ''n''-gram models, and similar approaches have been applied to translation models, but there is additional complexity due to different sentence lengths and word orders in the languages.
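As an illustration of a smoothed ''n''-gram model, the sketch below estimates a bigram language model with add-one (Laplace) smoothing. This is only one of the simplest smoothing schemes, chosen here for brevity rather than because the source prescribes it; the corpus and the function names are hypothetical.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, word, context_counts, bigram_counts, vocab):
    """Add-one (Laplace) smoothed estimate of p(word | prev)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def sentence_prob(sentence, context_counts, bigram_counts, vocab):
    """Probability of a sentence as a product of smoothed bigram probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word, context_counts, bigram_counts, vocab)
    return prob


# Hypothetical toy corpus, invented for illustration.
corpus = ["the house is small", "the house is blue", "the home is small"]
context_counts, bigram_counts, vocab = train_bigram_lm(corpus)
seen = bigram_prob("the", "house", context_counts, bigram_counts, vocab)
unseen = bigram_prob("the", "blue", context_counts, bigram_counts, vocab)
```

Smoothing matters because an unsmoothed model assigns probability zero to any sentence containing a bigram absent from the training data, which would rule out otherwise good translations. Here the unseen bigram "the blue" still receives a small non-zero probability.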
The statistical translation models were initially word-based (Models 1-5 from IBM, the hidden Markov model from Stephan Vogel〔S. Vogel, H. Ney and C. Tillmann (1996). HMM-based Word Alignment in Statistical Translation. In ''COLING '96: The 16th International Conference on Computational Linguistics'', pp. 836-841, Copenhagen, Denmark.〕 and Model 6 from Franz Josef Och〔F. Och and H. Ney (2003). A Systematic Comparison of Various Statistical Alignment Models. ''Computational Linguistics'', 29(1):19-51.〕), but significant advances were made with the introduction of phrase-based models.〔P. Koehn, F.J. Och, and D. Marcu (2003). Statistical phrase-based translation. In ''Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association of Computational Linguistics (HLT/NAACL)''.〕 Recent work has incorporated syntax or quasi-syntactic structures.〔D. Chiang (2005). A Hierarchical Phrase-Based Model for Statistical Machine Translation. In ''Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05)''.〕
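To give a flavour of the word-based models, the following is a simplified sketch in the spirit of IBM Model 1, which estimates word translation probabilities $t(f|e)$ with expectation-maximization. It is an illustration under stated assumptions rather than a faithful reimplementation: the NULL word is omitted, initialization is uniform, and the three-sentence corpus and the function name `train_ibm_model1` are hypothetical.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(f|e) by EM from (foreign_tokens, english_tokens) sentence pairs.

    Simplifications: no NULL word, uniform initialization.
    """
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)  # t[(f, e)] stands in for t(f|e)
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times f aligns to e
        total = defaultdict(float)  # expected number of alignments involving e
        for fs, es in pairs:
            for f in fs:
                # E-step: spread f's alignment mass over the English words.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-estimate t(f|e) from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    # Return a plain dict containing only word pairs that co-occurred in training.
    return dict(t)


# Hypothetical toy parallel corpus, invented for illustration.
pairs = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(pairs)
```

Although no word alignments are given, the co-occurrence pattern across sentences is enough for EM to prefer "das" over "Haus" as the translation of "the", and "Buch" over "ein" as the translation of "book".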

Excerpt source: Wikipedia, the free encyclopedia